
@andygrove commented Nov 16, 2025

Which issue does this PR close?

Closes #2019

Closes #2670

Rationale for this change

Finish refactoring the expression serde logic out of QueryPlanSerde.

What changes are included in this PR?

Mostly, this PR just moves code into the serde framework.

The pattern matching for MakeDecimal changed slightly: it no longer matches only when nullOnOverflow=true. As a result, more plans now run natively in Comet on Spark 4.0, which fixes #2670.

There is a follow-on issue, #2793.
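For context, MakeDecimal builds a decimal of a given precision and scale from an unscaled `Long`, and `nullOnOverflow` decides what happens when the value does not fit: NULL if it is true, an error otherwise. A minimal standalone sketch of those semantics using `java.math.BigDecimal` follows; `MakeDecimalSketch` is a hypothetical name, and this is an illustration under that assumption, not Spark's or Comet's implementation.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical, standalone sketch of MakeDecimal-style semantics.
// Not Spark's or Comet's code.
object MakeDecimalSketch {
  /** Interprets `unscaled` at `scale` and checks that it fits in `precision` digits. */
  def makeDecimal(
      unscaled: Long,
      precision: Int,
      scale: Int,
      nullOnOverflow: Boolean): Option[JBigDecimal] = {
    // e.g. unscaled = 12345, scale = 2 gives 123.45 (5 digits)
    val value = JBigDecimal.valueOf(unscaled, scale)
    if (value.precision() <= precision) {
      Some(value)
    } else if (nullOnOverflow) {
      None // overflow yields NULL
    } else {
      // overflow raises an error instead of yielding NULL
      throw new ArithmeticException(s"$value does not fit in DECIMAL($precision, $scale)")
    }
  }
}
```

With this sketch, `makeDecimal(123456L, 5, 2, nullOnOverflow = true)` returns `None` because 1234.56 needs six digits, while the same call with `nullOnOverflow = false` throws an `ArithmeticException`.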

How are these changes tested?

@andygrove marked this pull request as ready for review on November 16, 2025 16:42
@codecov-commenter commented Nov 16, 2025

Codecov Report

❌ Patch coverage is 84.50704% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.29%. Comparing base (f09f8af) to head (52c1272).
⚠️ Report is 708 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...che/comet/serde/CometBloomFilterMightContain.scala | 78.57% | 2 Missing and 1 partial ⚠️ |
| ...a/org/apache/comet/serde/CometScalarSubquery.scala | 75.00% | 2 Missing and 1 partial ⚠️ |
| .../org/apache/comet/serde/contraintExpressions.scala | 84.21% | 2 Missing and 1 partial ⚠️ |
| .../org/apache/comet/serde/collectionOperations.scala | 71.42% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2791      +/-   ##
============================================
+ Coverage     56.12%   58.29%   +2.17%     
- Complexity      976     1411     +435     
============================================
  Files           119      162      +43     
  Lines         11743    14139    +2396     
  Branches       2251     2362     +111     
============================================
+ Hits           6591     8243    +1652     
- Misses         4012     4704     +692     
- Partials       1140     1192      +52     
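The project figures are consistent with Codecov's definition of line coverage as hits / (hits + misses + partials); the three counts add up to Lines in each column (6591 + 4012 + 1140 = 11743 on main, 8243 + 4704 + 1192 = 14139 on the PR head). A quick check in Scala, where `CoverageMath` is a hypothetical helper:

```scala
// Hypothetical helper: line coverage as a percentage,
// hits / (hits + misses + partials).
object CoverageMath {
  def coveragePercent(hits: Int, misses: Int, partials: Int): Double =
    100.0 * hits / (hits + misses + partials)
}
```

`coveragePercent(8243, 4704, 1192)` is about 58.2997 and `coveragePercent(6591, 4012, 1140)` about 56.127, shown in the report as 58.29% and 56.12%.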


| `spark.comet.expression.ScalarSubquery.enabled` | Enable Comet acceleration for `ScalarSubquery` | true |
| `spark.comet.expression.Second.enabled` | Enable Comet acceleration for `Second` | true |
| `spark.comet.expression.Sha1.enabled` | Enable Comet acceleration for `Sha1` | true |
| `spark.comet.expression.Sha2.enabled` | Enable Comet acceleration for `Sha2` | true |
Member

There is no new entry for UnscaledValue

Member Author

Updated
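The table rows follow one naming pattern, `spark.comet.expression.<ExpressionName>.enabled`, each defaulting to `true`. The sketch below builds such a key and resolves it against a plain `Map` standing in for the Spark configuration; `ExprConfigKeys` is a hypothetical helper, not part of Comet, and the `UnscaledValue` key assumes the added entry uses the same pattern.

```scala
// Hypothetical helper: derives per-expression enable flags following the
// naming pattern in the table (spark.comet.expression.<Name>.enabled).
object ExprConfigKeys {
  def enabledKey(exprName: String): String =
    s"spark.comet.expression.$exprName.enabled"

  // A plain Map stands in for the Spark conf; an absent key falls back to
  // the default shown in the table (true).
  def isEnabled(conf: Map[String, String], exprName: String): Boolean =
    conf.get(enabledKey(exprName)).forall(_.trim.toBoolean)
}
```

In a real session the same key would presumably be set through the runtime config, for example `spark.conf.set("spark.comet.expression.Sha2.enabled", "false")`.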

val optExpr = scalarFunctionExprToProtoWithReturnType(
"make_decimal",
DecimalType(expr.precision, expr.scale),
false,
Member

Suggested change:
- false,
+ !expr.nullOnOverflow,

Member Author

Thanks. I filed #2793 to investigate. For this PR, I wanted to just refactor and not make any functional changes.

Member Author

I see what changed now. Previously, we only supported MakeDecimal in the case that nullOnOverflow=true, but now we always support it. So this PR does introduce a functional change.
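The behavioural difference can be pictured with a small sketch. `MakeDecimalNode` and `MakeDecimalMatching` are hypothetical names; the real serde matches on Spark's `MakeDecimal` expression, and this does not reproduce Comet's code.

```scala
// Hypothetical stand-in for Spark's MakeDecimal node, reduced to the fields used here.
final case class MakeDecimalNode(precision: Int, scale: Int, nullOnOverflow: Boolean)

object MakeDecimalMatching {
  // Old behaviour: only nodes with nullOnOverflow = true were accepted,
  // so plans with nullOnOverflow = false fell back to Spark.
  def supportedBefore(e: MakeDecimalNode): Boolean = e match {
    case MakeDecimalNode(_, _, true) => true
    case _                           => false
  }

  // New behaviour: every MakeDecimal is accepted, regardless of the flag.
  def supportedAfter(e: MakeDecimalNode): Boolean = e match {
    case MakeDecimalNode(_, _, _) => true
  }

  // The reviewer's suggestion: derive the boolean argument from the node
  // (!expr.nullOnOverflow) instead of hard-coding false.
  def suggestedFlag(e: MakeDecimalNode): Boolean = !e.nullOnOverflow
}
```

Whether that boolean should follow `nullOnOverflow` is what #2793 was filed to investigate.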

// TODO PromotePrecision
// TODO KnownFloatingPointNormalized
// TODO ScalarSubquery
// TODO UnscaledValue
Member

Suggested change:
- // TODO UnscaledValue
+ classOf[UnscaledValue] -> CometUnscaledValue

Member Author

Updated
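The suggested line registers the serde in a map keyed by expression class. A minimal sketch of that lookup pattern follows, using hypothetical stand-in types (`Expr`, `UnscaledValueNode`, `ExprSerde`, `SerdeRegistry`) rather than Comet's real `CometExpressionSerde` API. The serde is declared as an `object`, as also suggested for `CometUnscaledValue` in this review, so the map can hold the single stateless instance directly.

```scala
// Hypothetical stand-ins; these are not Spark's or Comet's real types.
sealed trait Expr
final case class UnscaledValueNode(child: String) extends Expr
final case class ReverseNode(child: String) extends Expr

trait ExprSerde[T <: Expr] {
  def convert(expr: T): Option[String]
}

// Stateless, so a singleton object is enough.
object UnscaledValueSerde extends ExprSerde[UnscaledValueNode] {
  def convert(expr: UnscaledValueNode): Option[String] =
    Some(s"unscaled_value(${expr.child})")
}

object SerdeRegistry {
  // One entry per supported expression class.
  val serdes: Map[Class[_ <: Expr], ExprSerde[_ <: Expr]] = Map(
    classOf[UnscaledValueNode] -> UnscaledValueSerde
  )

  // Look up by runtime class; a missing entry is modelled as None (unsupported).
  def convert(expr: Expr): Option[String] =
    serdes.get(expr.getClass) match {
      case Some(serde) => serde.asInstanceOf[ExprSerde[Expr]].convert(expr)
      case None        => None
    }
}
```

Adding support for another expression then amounts to one more `classOf[...] -> ...` entry.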

val value = expr.right
val bloomFilterExpr = exprToProtoInternal(bloomFilter, inputs, binding)
val valueExpr = exprToProtoInternal(value, inputs, binding)
if (bloomFilterExpr.isDefined && valueExpr.isDefined) {
Member

nit: Pattern matching is a bit more idiomatic:

Suggested change:
- if (bloomFilterExpr.isDefined && valueExpr.isDefined) {
+ (bloomFilterExpr, valueExpr) match {
+   case (Some(bf), Some(v)) =>
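A self-contained version of the suggested idiom, where `Option[String]` stands in for the serialized child expressions and `OptionPairMatch.mightContain` is a hypothetical name. Binding `bf` and `v` in the pattern avoids calling `.get` after an `isDefined` check.

```scala
// Hypothetical helper: match on a tuple of Options instead of testing
// isDefined on each one and then unwrapping with .get.
object OptionPairMatch {
  def mightContain(bloomFilter: Option[String], value: Option[String]): Option[String] =
    (bloomFilter, value) match {
      case (Some(bf), Some(v)) => Some(s"might_contain($bf, $v)")
      case _                   => None // at least one child could not be serialized
    }
}
```

An equivalent for-comprehension, `for { bf <- bloomFilter; v <- value } yield ...`, is another common choice.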


import org.apache.comet.serde.QueryPlanSerde.{exprToProtoInternal, optExprWithInfo, scalarFunctionExprToProtoWithReturnType}

class CometUnscaledValue extends CometExpressionSerde[UnscaledValue] {
Member

Suggested change:
- class CometUnscaledValue extends CometExpressionSerde[UnscaledValue] {
+ object CometUnscaledValue extends CometExpressionSerde[UnscaledValue] {

Member Author

Updated

@andygrove changed the title from "chore: Refactor more expr serde" to "chore: Finish refactoring expression serde out of QueryPlanSerde" on Nov 17, 2025
Contributor

@comphead left a comment

Thanks @andygrove, LGTM.

: : +- * BroadcastHashJoin Inner BuildRight (29)
: : :- * Filter (13)
: : : +- * HashAggregate (12)
: : : +- * CometColumnarToRow (11)
Contributor

@comphead commented Nov 17, 2025

Is there no fallback to Spark anymore?

Member Author

This PR added support for MakeDecimal in ANSI mode, although we still need to add tests.

import org.apache.comet.serde.ExprOuterClass.Expr
import org.apache.comet.serde.QueryPlanSerde.{createBinaryExpr, exprToProtoInternal, optExprWithInfo, scalarFunctionExprToProto}

object CometReverse extends CometScalarFunction[Reverse]("reverse") {
Member

Member Author

Thanks. Moved to match Spark organization.
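The one-line `CometReverse` declaration relies on a generic base class that takes the native function name as a constructor argument. Below is a minimal sketch of that pattern with hypothetical types (`ScalarCall`, `ReverseCall`, `NamedScalarFunction`); it is not Comet's `CometScalarFunction`, whose actual behaviour may differ.

```scala
// Hypothetical stand-in for an expression with child expressions.
trait ScalarCall { def children: Seq[String] }

final case class ReverseCall(child: String) extends ScalarCall {
  def children: Seq[String] = Seq(child)
}

// Generic serde: everything except the function name is shared, so each
// simple expression needs only a one-line object declaration.
abstract class NamedScalarFunction[T <: ScalarCall](name: String) {
  def convert(expr: T): String = s"$name(${expr.children.mkString(", ")})"
}

object ReverseFunction extends NamedScalarFunction[ReverseCall]("reverse")
```

Keeping the name as the only per-expression detail is what lets such declarations be grouped by file, for example to mirror Spark's own organization.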

Member

@wForget left a comment

Thanks @andygrove

@andygrove merged commit 7f90fc4 into apache:main on Nov 19, 2025
152 of 153 checks passed
@andygrove deleted the serde-refactor-few-more branch on November 19, 2025 13:59
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025

Development

Successfully merging this pull request may close these issues.

[Spark 4.0] MakeDecimal is not supported
[EPIC] Refactor all expression serde logic out of QueryPlanSerde

5 participants